
    Using accelerators to speed up scientific and engineering codes: perspectives and problems

    Accelerators are quickly emerging as the leading technology to further boost computing performance; their main feature is a massively parallel on-chip architecture. NVIDIA and AMD GPUs and the Intel Xeon-Phi are examples of accelerators available today. Accelerators are power-efficient and deliver up to one order of magnitude more peak performance than traditional CPUs. However, existing codes for traditional CPUs require substantial changes to run efficiently on accelerators, including rewriting with specific programming languages. In this contribution we present our experience in porting large codes to NVIDIA GPU and Intel Xeon-Phi accelerators. Our reference application is a CFD code based on the Lattice Boltzmann (LB) method. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism; however, exploiting a large fraction of the theoretically available performance is not easy. We consider a state-of-the-art two-dimensional LB model based on 37 populations (a D2Q37 model) that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. We describe in detail how we implement and optimize our LB code for the Xeon-Phi and for GPUs, and then analyze performance on single- and multi-accelerator systems. We finally compare these results with those available on recent traditional multi-core CPUs.
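
    As a rough illustration of the two computational phases that dominate such an LB code (a generic sketch, not the authors' implementation), the C fragment below separates the propagate (streaming) step, which gathers populations from neighbouring sites, from the purely local collide step. The 9-velocity stencil and the trivial relaxation "collision" are stand-ins for the real 37-population thermal model.

        /* Minimal sketch of a lattice Boltzmann update split into propagate
         * and collide; the 9-velocity set stands in for the 37-velocity
         * D2Q37 stencil and the "collision" is a trivial relaxation
         * placeholder, both assumptions for illustration only. */
        #include <stddef.h>

        #define LX   128
        #define LY   128
        #define NPOP 9                       /* D2Q37 would use 37 */

        static const int cx[NPOP] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
        static const int cy[NPOP] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };

        /* structure-of-arrays layout: population p of site (x,y) */
        static size_t idx(int p, int x, int y) {
            return ((size_t)p * LX + x) * LY + y;
        }

        /* propagate: each site gathers population p from the neighbour
         * upstream along (cx[p], cy[p]); periodic boundaries */
        static void propagate(const double *src, double *dst) {
            for (int p = 0; p < NPOP; p++)
                for (int x = 0; x < LX; x++)
                    for (int y = 0; y < LY; y++) {
                        int xs = (x - cx[p] + LX) % LX;
                        int ys = (y - cy[p] + LY) % LY;
                        dst[idx(p, x, y)] = src[idx(p, xs, ys)];
                    }
        }

        /* collide: completely local; a relaxation towards the per-site
         * average stands in here for the real thermal collision operator */
        static void collide(double *f, double omega) {
            for (int x = 0; x < LX; x++)
                for (int y = 0; y < LY; y++) {
                    double rho = 0.0;
                    for (int p = 0; p < NPOP; p++)
                        rho += f[idx(p, x, y)];
                    double feq = rho / NPOP;
                    for (int p = 0; p < NPOP; p++)
                        f[idx(p, x, y)] += omega * (feq - f[idx(p, x, y)]);
                }
        }

    The point that matters for accelerators is that propagate is memory-bandwidth bound with scattered accesses, while collide is compute bound and embarrassingly parallel over lattice sites.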

    Nature of the spin-glass phase at experimental length scales

    We present a massive equilibrium simulation of the three-dimensional Ising spin glass at low temperatures. The Janus special-purpose computer has allowed us to equilibrate, using parallel tempering, L=32 lattices down to T=0.64 Tc. We demonstrate the relevance of equilibrium finite-size simulations for understanding experimental non-equilibrium spin glasses in the thermodynamic limit by establishing a time-length dictionary. We conclude that non-equilibrium experiments performed on a time scale of one hour can be matched with equilibrium results on L=110 lattices. A detailed investigation of the probability distribution functions of the spin and link overlap, as well as of their correlation functions, shows that Replica Symmetry Breaking is the appropriate theoretical framework for the physically relevant length scales. In addition, we improve on existing methodologies to ensure equilibration in parallel tempering simulations. Comment: 48 pages, 19 postscript figures, 9 tables. Version accepted for publication in the Journal of Statistical Mechanics.
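
    For readers unfamiliar with the parallel tempering method mentioned above, its core is the replica-exchange step sketched below: adjacent temperatures attempt to swap configurations with the standard Metropolis acceptance probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). This is a generic textbook formulation in C, not the Janus implementation, and the data layout is purely illustrative.

        /* Generic sketch of one parallel-tempering (replica exchange) sweep.
         * Textbook formulation for illustration; not the Janus code. */
        #include <math.h>
        #include <stdlib.h>

        void parallel_tempering_sweep(int n_replicas,
                                      const double *beta,   /* sorted inverse temperatures */
                                      double *energy,       /* energy at each temperature  */
                                      int *conf)            /* which configuration sits at */
                                                            /* each temperature            */
        {
            for (int i = 0; i + 1 < n_replicas; i++) {
                /* acceptance = min(1, exp[(beta_i - beta_j)(E_i - E_j)]) */
                double p = exp((beta[i] - beta[i + 1]) * (energy[i] - energy[i + 1]));
                if (p >= 1.0 || (double)rand() / RAND_MAX < p) {
                    int    tc = conf[i];   conf[i]   = conf[i + 1];   conf[i + 1]   = tc;
                    double te = energy[i]; energy[i] = energy[i + 1]; energy[i + 1] = te;
                }
            }
        }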

    Simulating spin systems on IANUS, an FPGA-based computer

    We describe the hardwired implementation of algorithms for Monte Carlo simulations of a large class of spin models. We have implemented these algorithms as VHDL codes and mapped them onto a dedicated processor based on a large FPGA device. The measured performance of one such processor is comparable to that of O(100) carefully programmed high-end PCs, and turns out to be even better for some selected spin models. We describe here the codes that we are currently executing on the IANUS massively parallel FPGA-based system. Comment: 19 pages, 8 figures; submitted to Computer Physics Communications.
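
    To give an idea of the kind of kernel that lends itself to a hardwired implementation, the C sketch below shows a single Metropolis update of one spin in a 2D Edwards-Anderson-like model with couplings J = +/-1. It is a simplified stand-in, not the VHDL actually used; the lattice geometry, coupling layout and random-number source are illustrative assumptions.

        /* Sketch of the elementary operation that a machine like IANUS can
         * replicate many times in hardware: one Metropolis update of an
         * Ising spin in a 2D Edwards-Anderson-like spin glass with J = +/-1.
         * Simplified stand-in for illustration only. */
        #include <math.h>
        #include <stdlib.h>

        #define L 64   /* linear lattice size */

        /* s[x][y] = +/-1; jr/ju hold the coupling of each site to its
         * right and upper neighbour, respectively */
        void metropolis_site(int s[L][L], const int jr[L][L], const int ju[L][L],
                             int x, int y, double beta)
        {
            int xp = (x + 1) % L, xm = (x + L - 1) % L;
            int yp = (y + 1) % L, ym = (y + L - 1) % L;

            /* local field acting on spin (x,y) */
            int h = jr[x][y]  * s[xp][y] + jr[xm][y] * s[xm][y]
                  + ju[x][y]  * s[x][yp] + ju[x][ym] * s[x][ym];

            int dE = 2 * s[x][y] * h;        /* energy change if the spin flips */
            if (dE <= 0 || (double)rand() / RAND_MAX < exp(-beta * dE))
                s[x][y] = -s[x][y];
        }

    An FPGA can instantiate many copies of this update logic, together with the random-number generators feeding it, and run them concurrently on every clock cycle; this kind of replication is, broadly, where the reported performance comes from.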

    Janus II: a new generation application-driven computer for spin-system simulations

    This paper describes the architecture, the development and the implementation of Janus II, a new-generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are by far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order-of-magnitude increases in performance, reducing the time needed to carry out simulation campaigns that would take centuries on commercially available machines to values acceptable on a human scale. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful JANUS machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine, and compares its expected performance with that of currently available commercial systems. Comment: 28 pages, 6 figures.

    Temperature chaos is present in off-equilibrium spin-glass dynamics

    Experiments featuring non-equilibrium glassy dynamics under temperature changes still await interpretation. There is a widespread feeling that temperature chaos (an extreme sensitivity of the glass to temperature changes) should play a major role but, up to now, this phenomenon has been investigated solely under equilibrium conditions. In fact, the very existence of a chaotic effect in the non-equilibrium dynamics is yet to be established. In this article, we tackle this problem through a large simulation of the 3D Edwards-Anderson model, carried out on the Janus II supercomputer. We find a dynamic effect that closely parallels equilibrium temperature chaos. This dynamic temperature-chaos effect is spatially heterogeneous to a large degree and turns out to be controlled by the spin-glass coherence length ξ. Indeed, an emerging length scale ξ* rules the crossover from weak chaos (for ξ ≪ ξ*) to strong chaos (for ξ ≫ ξ*). Extrapolations of ξ* to relevant experimental conditions are provided. © 2021, The Author(s).

    Early Experience on Porting and Running a Lattice Boltzmann Code on the Xeon-Phi Co-Processor

    In this paper we report on our early experience porting, optimizing and benchmarking a Lattice Boltzmann (LB) code on the Xeon-Phi co-processor, the first generally available version of the new Many Integrated Core (MIC) architecture developed by Intel. As a test-bed we consider a state-of-the-art LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. The regular structure of LB algorithms makes it relatively easy to identify a large degree of available parallelism. However, mapping a large fraction of this parallelism onto this new class of processors is not straightforward. The D2Q37 LB algorithm considered in this paper is an appropriate test-bed for this architecture, since its critical computing kernels require high performance both in terms of memory bandwidth for sparse memory-access patterns and in terms of number-crunching capability. We describe our implementation of the code, which builds on previous experience with other (simpler) many-core processors and GPUs, present benchmark results and measured performance, and finally compare with the results obtained by previous implementations developed for state-of-the-art classic multi-core CPUs and GP-GPUs.
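
    As an illustration of how this site-level parallelism might be mapped onto a many-core device (a generic sketch, not the tuned kernels described in the paper), the C/OpenMP fragment below distributes lattice rows over threads and requests SIMD vectorization along the unit-stride direction, with the populations stored as structure-of-arrays; the placeholder collision, the sizes and the pragmas used are assumptions.

        /* Generic sketch of thread- plus vector-level parallelism for a
         * local LB kernel on a many-core processor (compile with -fopenmp).
         * Structure-of-arrays layout: f[p][x][y], where y is the unit-stride
         * direction so that SIMD lanes touch contiguous memory.
         * The "collision" is a placeholder relaxation, not the D2Q37 one. */
        #define LX   256
        #define LY   256
        #define NPOP 37

        void collide_sites(double f[NPOP][LX][LY], double omega)
        {
            #pragma omp parallel for                /* threads over rows    */
            for (int x = 0; x < LX; x++) {
                #pragma omp simd                    /* vector lanes along y */
                for (int y = 0; y < LY; y++) {
                    double rho = 0.0;
                    for (int p = 0; p < NPOP; p++)
                        rho += f[p][x][y];
                    double feq = rho / NPOP;
                    for (int p = 0; p < NPOP; p++)
                        f[p][x][y] += omega * (feq - f[p][x][y]);
                }
            }
        }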

    Implementation and Optimization of a Thermal Lattice Boltzmann Algorithm on a multi-GPU cluster

    Lattice Boltzmann (LB) methods are widely used today to describe the dynamics of fluids. Key advantages of this approach are the relative ease with which complex physics behavior, e.g. associated with multi-phase flows or irregular boundary conditions, can be modeled, and, from a computational perspective, the large degree of available parallelism, which can be easily exploited on massively parallel systems. The advent of multi-core and many-core processors, including General Purpose Graphics Processing Units (GP-GPUs), has pushed the quest for parallelization also to the intra-processor level. From this point of view, LB methods may strongly benefit from these new architectures. In this paper we describe the implementation and optimization of a recently proposed thermal LB model, the so-called D2Q37 model, on multi-GPU systems. We describe in detail the optimization techniques that we have used at both the intra-processor and inter-processor level, present performance and scaling figures, and analyze the bottlenecks associated with this implementation.
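
    At the inter-processor level, a common pattern for codes of this kind is to exchange halo (border) regions with non-blocking MPI while the bulk of the lattice is being updated, so that communication overlaps with computation. The C/MPI sketch below shows this generic pattern with placeholder kernel calls; it is not necessarily the scheme adopted in the paper, and all names are illustrative.

        /* Generic overlap of halo exchange and bulk computation with
         * non-blocking MPI; buffer layout and kernels are placeholders. */
        #include <mpi.h>

        void lb_step_with_overlap(double *send_lo, double *send_hi,
                                  double *recv_lo, double *recv_hi,
                                  int halo_count, int rank_lo, int rank_hi,
                                  MPI_Comm comm)
        {
            MPI_Request req[4];

            /* post receives and sends for the two borders */
            MPI_Irecv(recv_lo, halo_count, MPI_DOUBLE, rank_lo, 0, comm, &req[0]);
            MPI_Irecv(recv_hi, halo_count, MPI_DOUBLE, rank_hi, 1, comm, &req[1]);
            MPI_Isend(send_lo, halo_count, MPI_DOUBLE, rank_lo, 1, comm, &req[2]);
            MPI_Isend(send_hi, halo_count, MPI_DOUBLE, rank_hi, 0, comm, &req[3]);

            /* update_bulk();    placeholder: propagate/collide on interior
             *                   sites that need no halo data               */

            MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

            /* update_borders(); placeholder: finish the sites that depend
             *                   on the freshly received halos              */
        }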

    A multi-GPU implementation of a D2Q37 lattice Boltzmann code

    We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on NVIDIA Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices that we have adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at about 30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster. Keywords: Computational fluid dynamics; Lattice Boltzmann methods; GP-GPU computing.